Efficient estimation of speaker-specific projecting feature transforms
Abstract
This paper introduces a new, efficient approach for estimating projecting feature transforms for speech recognition. It is based on the MMI′ criterion, a likelihood-ratio criterion motivated by a simplification of the MMI criterion, and is shown to be closely related to HLDA. Compared with current methods, the new method is faster, which makes it more suitable for speaker adaptive training, where the number of speakers, and therefore the number of transforms, is substantial. The proposed method was integrated into the RWTH parliamentary speeches transcription system. Experimental results are presented for speaker-specific projecting transforms, both when used in recognition only and when used for speaker adaptive training, and show consistent improvements. Furthermore, the observed improvements are shown to be additive to those of MLLR. Comparisons to DLT are presented, as are results for a new projecting DLT method.
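The abstract does not spell out the MMI′ criterion, so the sketch below does not implement the paper's method. As a stand-in it uses classical LDA, a well-known closed-form projecting transform (HLDA, which the abstract names, generalizes it), to illustrate what "speaker-specific projecting transforms" means in practice: one p×n matrix A per speaker, estimated from that speaker's labelled frames, that maps n-dimensional features to p < n dimensions via y = A x. The function names (`lda_projection`, `estimate_speaker_transforms`, `project`) and the per-speaker dictionary layout are hypothetical.

```python
import numpy as np


def lda_projection(X: np.ndarray, labels: np.ndarray, p: int) -> np.ndarray:
    """Stand-in estimator (classical LDA, NOT the paper's MMI' method).

    X: (N, n) feature vectors; labels: (N,) class ids (e.g. HMM-state labels).
    Returns a (p, n) projecting matrix A.
    """
    n = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((n, n))  # within-class scatter
    Sb = np.zeros((n, n))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        d = Xc - mc
        Sw += d.T @ d
        diff = (mc - mu)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Solve Sb v = lambda Sw v symmetrically: with Sw = L L^T and v = L^{-T} w,
    # the problem becomes (L^{-1} Sb L^{-T}) w = lambda w.
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, W = np.linalg.eigh(Linv @ Sb @ Linv.T)  # eigenvalues ascending
    V = Linv.T @ W  # columns are generalized eigenvectors
    return V[:, ::-1][:, :p].T  # keep the p most discriminative directions


def estimate_speaker_transforms(data: dict, p: int) -> dict:
    """Hypothetical helper: one projecting transform per speaker.

    data maps speaker id -> (X, labels) for that speaker's frames.
    """
    return {spk: lda_projection(X, y, p) for spk, (X, y) in data.items()}


def project(X: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Apply y = A x to every row of X, giving an (N, p) array."""
    return X @ A.T
```

In speaker adaptive training, a function like `estimate_speaker_transforms` is called once per training iteration, so its per-speaker cost multiplies with the number of speakers. This is why the abstract stresses the speed of the estimation procedure.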
Similar resources
A region-specific feature-space transformation for speaker adaptation and singularity analysis of jacobian matrix
In this paper, we present an in-depth analysis of a recently proposed method for speaker adaptation. The method involves a region-specific feature-space transformation, which we refer to as soft R-FMLLR. We argue that the method has certain difficulties, the most significant being the fact that it is noninvertible. An analysis that pertains to the singularity of the Jacobian matrix is presented...
Joint environment and speaker normalization using factored front-end CMLLR
The problem of joint compensation for environment and speaker variabilities is addressed. A factored feature-space transform, named factored front-end CMLLR (F-FE-CMLLR), is investigated, which consists of a cascade of two transforms: front-end CMLLR for environment normalization and CMLLR for speaker normalization. In this paper, we propose an iterative estimation algorithm for F-FE-CMLLR. ...
Asynchronous factorisation of speaker and background with feature transforms in speech recognition
This paper presents a novel approach to separating the effects of speaker and background conditions by applying feature-transform-based adaptation for Automatic Speech Recognition (ASR). So far, factorisation has been shown to yield improvements in the case of utterance-synchronous environments. In this paper we show successful separation of conditions asynchronous with speech, such as back...
Factored adaptation using a combination of feature-space and model-space transforms
Acoustic model adaptation can mitigate the degradation in recognition accuracy caused by speaker or environment mismatch. While there are many methods for speaker or environment adaptation, far less attention has been focused on methods that compensate for both simultaneously. We recently proposed an algorithm called factored adaptation which jointly estimates speaker and environment transforms...
Maximum Likelihood Linear Transformations for HMM
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-b...
Publication date: 2007